AITopics | open-weight model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

877b40688e330a0e2a3fc24084208dfa-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-16-2026, 09:56:51 GMT

large language model, machine learning, natural language

Country:

Asia > China > Shanghai > Shanghai (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology (0.92)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Software (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

What's next for Chinese open-source AI

MIT Technology Review, Feb-12-2026, 10:00:00 GMT

Chinese open models are spreading fast, from Hugging Face to Silicon Valley. (Photo illustration: the DeepSeek app on a phone in front of a flag of China, January 28, 2025, Hong Kong.) The past year has marked a turning point for Chinese AI. Since DeepSeek released its R1 reasoning model in January 2025, Chinese companies have repeatedly delivered AI models that match the performance of leading Western models at a fraction of the cost. Just last week, the Chinese firm Moonshot AI released its latest open-weight model, Kimi K2.5, which came close to top proprietary systems such as Anthropic's Claude Opus on some early benchmarks. The difference: K2.5 costs roughly one-seventh as much as Opus.

large language model, machine learning, natural language

Country:

North America > United States > California (0.25)
Asia > China > Hong Kong (0.25)

Industry:

Information Technology (1.00)
Banking & Finance (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

So Long, GPT-5. Hello, Qwen

WIRED, Dec-27-2025, 11:00:00 GMT

In the AI boom, chatbots and GPTs come and go quickly. On a drizzly and windswept afternoon this summer, I visited the headquarters of Rokid, a startup developing smart glasses in Hangzhou, China. As I chatted with engineers, their words were swiftly translated from Mandarin to English, and then transcribed onto a tiny translucent screen just above my right eye using one of the company's new prototype devices. Rokid's high-tech spectacles use Qwen, an open-weight large language model developed by the Chinese ecommerce giant Alibaba. OpenAI's GPT-5, Google's Gemini 3, and Anthropic's Claude often score higher on benchmarks designed to gauge different dimensions of machine cleverness.

ai model, gpt-5, qwen

Country:

Asia > China > Zhejiang Province > Hangzhou (0.25)
North America > United States > Michigan (0.05)
North America > United States > California (0.05)

Industry: Information Technology > Services (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Quantitative Analysis of Technical Debt and Pattern Violation in Large Language Model Architectures

Slater, Tyler

arXiv.org Artificial Intelligence, Dec-5-2025

As Large Language Models (LLMs) transition from code completion tools to autonomous system architects, their impact on long-term software maintainability remains unquantified. While existing research benchmarks functional correctness (pass@k), this study presents the first empirical framework to measure "Architectural Erosion" and the accumulation of Technical Debt in AI-synthesized microservices. We conducted a comparative pilot study of three state-of-the-art models (GPT-5.1, Claude 4.5 Sonnet, and Llama 3 8B) by prompting them to implement a standardized Book Lending Microservice under strict Hexagonal Architecture constraints. Utilizing Abstract Syntax Tree (AST) parsing, we find that while proprietary models achieve high architectural conformance (0% violation rate for GPT-5.1), open-weights models exhibit critical divergence. Specifically, Llama 3 demonstrated an 80% Architectural Violation Rate, frequently bypassing interface adapters to create illegal circular dependencies between Domain and Infrastructure layers. Furthermore, we identified a phenomenon of "Implementation Laziness," where open-weights models generated 60% fewer Logical Lines of Code (LLOC) than their proprietary counterparts, effectively omitting complex business logic to satisfy token constraints. These findings suggest that without automated architectural linting, utilizing smaller open-weights models for system scaffolding accelerates the accumulation of structural technical debt.
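The abstract does not describe its AST tooling in detail. As a minimal sketch, assuming a hypothetical layout in which domain-layer code must never import from a package named `infrastructure`, Python's standard `ast` module can flag such dependency violations and compute a per-file violation rate. The layer name, the sample files, and the per-file definition of the rate are illustrative assumptions, not the study's method.

```python
import ast

# Hypothetical package name for the infrastructure layer (an assumption,
# not taken from the study).
FORBIDDEN_LAYER = "infrastructure"


def imported_modules(source: str) -> list[str]:
    """Return every module name imported by a piece of Python source."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports such as "from ..infrastructure import db"
            # also report module == "infrastructure".
            modules.append(node.module)
    return modules


def violates_layering(source: str) -> bool:
    """True if a domain-layer file imports the infrastructure layer directly."""
    return any(
        name == FORBIDDEN_LAYER or name.startswith(FORBIDDEN_LAYER + ".")
        for name in imported_modules(source)
    )


def violation_rate(domain_files: dict[str, str]) -> float:
    """Fraction of domain-layer files containing at least one violation."""
    if not domain_files:
        return 0.0
    flagged = sum(violates_layering(src) for src in domain_files.values())
    return flagged / len(domain_files)


# Hypothetical domain-layer files for a book-lending service.
files = {
    "domain/book.py": "from dataclasses import dataclass\n",
    "domain/ports.py": "from typing import Protocol\n",
    "domain/loan.py": "from infrastructure.sql_repo import SqlLoanRepository\n",
    "domain/policy.py": "import datetime\n",
}
print(violation_rate(files))  # 0.25: one of four files imports infrastructure
```

Matching on the module-name prefix rather than on substrings avoids false positives such as a hypothetical `infrastructure_utils` package; a production linter would also need to handle aliased and dynamic imports.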

large language model, llama 3, machine learning

arXiv:2512.04273

Country: North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Price of Progress: Algorithmic Efficiency and the Falling Cost of AI Inference

Gundlach, Hans, Lynch, Jayson, Mertens, Matthias, Thompson, Neil

arXiv.org Artificial Intelligence, Dec-1-2025

Language models have seen enormous progress on advanced benchmarks in recent years, but much of this progress has only been possible by using more costly models. Benchmarks may therefore present a warped picture of progress in practical capabilities per dollar. To remedy this, we use data from Artificial Analysis and Epoch AI to form the largest dataset to date of current and historical prices for running benchmarks. We find that the price for a given level of benchmark performance has decreased remarkably fast, by around 5× to 10× per year, for frontier models on knowledge, reasoning, math, and software engineering benchmarks. These reductions in the cost of AI inference are due to economic forces, hardware efficiency improvements, and algorithmic efficiency improvements. Isolating open models to control for competition effects and dividing out hardware price declines, we estimate that algorithmic efficiency progress is around 3× per year. Finally, we recommend that evaluators both publicize and take into account the price of benchmarking as an essential part of measuring the real-world impact of AI.
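The decomposition described in the abstract comes down to compounding arithmetic. The sketch below converts two price observations into a per-year reduction factor and then divides out a hardware improvement rate. All prices, durations, and the hardware rate are hypothetical values chosen for illustration; none of them come from the paper, and the function names are likewise ours.

```python
def annual_factor(price_start: float, price_end: float, years: float) -> float:
    """Per-year price reduction factor implied by two price observations
    for the same level of benchmark performance."""
    if price_start <= 0 or price_end <= 0 or years <= 0:
        raise ValueError("prices and duration must be positive")
    return (price_start / price_end) ** (1.0 / years)


def algorithmic_factor(total_factor: float, hardware_factor: float) -> float:
    """Yearly gain left over after dividing out hardware price-performance gains."""
    return total_factor / hardware_factor


# Hypothetical: the cost to reach a fixed score falls from $8.00 to $0.50
# over 2 years (16x overall), and hardware improves 1.3x per year.
total = annual_factor(8.0, 0.5, 2.0)
algo = algorithmic_factor(total, hardware_factor=1.3)
print(f"total: {total:.1f}x/yr, algorithmic: {algo:.2f}x/yr")
# total: 4.0x/yr, algorithmic: 3.08x/yr
```

Working with multiplicative per-year factors rather than differences keeps the decomposition consistent: a 16× drop over two years is 4× per year, not 8×.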

benchmark, large language model, machine learning

arXiv:2511.23455

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Geometry of Decision Making in Language Models

Joshi, Abhinav, Bhatt, Divyanshu, Modi, Ashutosh

arXiv.org Artificial Intelligence, Nov-26-2025

Large Language Models (LLMs) show strong generalization across diverse tasks, yet the internal decision-making processes behind their predictions remain opaque. In this work, we study the geometry of hidden representations in LLMs through the lens of intrinsic dimension (ID), focusing specifically on decision-making dynamics in a multiple-choice question answering (MCQA) setting. We perform a large-scale study of 28 open-weight transformer models, estimating ID across layers with multiple estimators while also quantifying per-layer performance on MCQA tasks. Our findings reveal a consistent ID pattern across models: early layers operate on low-dimensional manifolds, middle layers expand this space, and later layers compress it again, converging to decision-relevant representations. Together, these results suggest that LLMs implicitly learn to project linguistic inputs onto structured, low-dimensional manifolds aligned with task-specific decisions, providing new geometric insights into how generalization and reasoning emerge in language models.
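The abstract does not name the ID estimators it uses. One widely used choice is the TwoNN estimator of Facco et al. (2017), which needs only each point's distances to its two nearest neighbors. The sketch below assumes that estimator, in its maximum-likelihood form, and checks it on synthetic points lying on a 2-D plane embedded in 10-D. Applying it per layer would mean passing each layer's matrix of hidden states (one row per input); that usage, and the function name `two_nn_id`, are our assumptions rather than the authors' pipeline.

```python
import numpy as np


def two_nn_id(x: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate, maximum-likelihood form.

    For each point, mu = r2 / r1 is the ratio of its second- to its
    first-nearest-neighbour distance. Under the TwoNN model, mu follows a
    Pareto law with exponent d, whose MLE is d = N / sum(log mu).
    """
    sq = np.sum(x * x, axis=1)
    # Squared pairwise Euclidean distances via |a|^2 + |b|^2 - 2 a.b;
    # O(N^2) memory, fine for a few thousand points.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)      # clip tiny negatives caused by round-off
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nearest = np.sort(d2, axis=1)[:, :2]
    r1, r2 = np.sqrt(nearest[:, 0]), np.sqrt(nearest[:, 1])
    mu = r2 / r1
    return float(len(mu) / np.sum(np.log(mu)))


# Synthetic check: 1,000 points on a random 2-D plane inside R^10.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(1000, 2))
basis = rng.normal(size=(2, 10))
points = latent @ basis
print(f"estimated ID: {two_nn_id(points):.2f}")  # should be close to 2
```

Because TwoNN uses only ratios of neighbor distances, it is insensitive to the overall scale of the representation, which is convenient when activation norms differ widely across layers.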

large language model, machine learning, natural language

arXiv:2511.20315

Country:

Asia > Middle East (0.67)
North America > United States > Minnesota (0.27)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.48)
Leisure & Entertainment > Sports (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Devil in the Details: Emergent Misalignment, Format and Coherence in Open-Weights LLMs

Dickson, Craig

arXiv.org Artificial Intelligence, Nov-26-2025

Prior work has shown that fine-tuning models on a narrow domain with misaligned data can lead to broad misalignment, a phenomenon termed "emergent misalignment" (Betley et al. 2025). While all tested models were susceptible to emergent misalignment, some showed more resistance than others. Specifically, the Qwen-2.5 family proved relatively resistant, while GPT-4o exhibited the strongest misalignment. In this paper, we evaluate whether current-generation open-weights models exhibit resistance similar to the Qwen-2.5 family and measure misalignment robustness over a range of model architectures and scales. We replicate the effect across nine modern open-weights models (Gemma 3 and Qwen 3 families, 1B-32B parameters). Models fine-tuned on insecure code generation show a 0.68% misalignment rate (compared to 0.07% for base models), matching the lower end of prior open-model results but dramatically lower than GPT-4o's 20%. We identify a critical format-dependent vulnerability: requiring JSON output more than doubles misalignment rates compared to natural-language prompts (0.96% vs 0.42%). This suggests that structural constraints may bypass safety training by reducing the model's "degrees of freedom" to refuse. These findings confirm emergent misalignment as a reproducible phenomenon in modern open-weights models, with rates substantially lower than those observed in proprietary systems.
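The percentages quoted in the abstract are easier to compare as ratios. The snippet below only recomputes ratios from those quoted figures; it adds no new data, and the dictionary keys are labels chosen here for readability.

```python
# Misalignment rates (percent) as quoted in the abstract.
rates = {
    "base": 0.07,
    "fine_tuned_insecure_code": 0.68,
    "json_prompt": 0.96,
    "natural_language_prompt": 0.42,
    "gpt4o_prior_work": 20.0,
}


def ratio(numerator: str, denominator: str) -> float:
    """How many times larger one quoted rate is than another."""
    return rates[numerator] / rates[denominator]


print(f"fine-tuned vs base: {ratio('fine_tuned_insecure_code', 'base'):.1f}x")
print(f"JSON vs natural language: {ratio('json_prompt', 'natural_language_prompt'):.2f}x")
print(f"GPT-4o vs fine-tuned open models: {ratio('gpt4o_prior_work', 'fine_tuned_insecure_code'):.1f}x")
# fine-tuned vs base: 9.7x
# JSON vs natural language: 2.29x
# GPT-4o vs fine-tuned open models: 29.4x
```

The JSON-versus-natural-language ratio of about 2.3 is what "more than doubles" summarizes; in absolute terms, the gap is only about half a percentage point.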

large language model, machine learning, natural language

arXiv:2511.20104

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.67)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

The US Needs an Open Source AI Intervention to Beat China

WIRED, Nov-19-2025, 19:00:00 GMT

Depending on foreign-made open models is both a supply chain risk and an innovation problem, experts say. Since 2022, America has had a solid lead in artificial intelligence thanks to advanced models from high-flying companies like OpenAI, Google DeepMind, Anthropic, and xAI. A growing number of experts, however, worry that the US is starting to fall behind when it comes to minting open-weight AI models that can be downloaded, adapted, and run locally. Open models from Chinese companies like Kimi, Z.ai, Alibaba, and DeepSeek are now rapidly gaining popularity among researchers and engineers worldwide, leaving the US as a laggard in an increasingly vital area of AI innovation. "The US needs open models to cement its lead at every level of the AI stack," Nathan Lambert, founder of the ATOM (American Truly Open Models) Project, tells WIRED.

large language model, machine learning, natural language

Country: North America > United States (1.00)

Industry:

Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

OpenAI's Open-Weight Models Are Coming to the US Military

WIRED, Nov-13-2025, 11:00:00 GMT

The gpt-oss models are being tested for use on sensitive military computers. But some defense insiders say that OpenAI is still behind the competition. When OpenAI unveiled its first open-weight models in years this August, it wasn't just tech companies that were paying attention. The release also excited US military and defense contractors, which saw a chance to use them for highly secure operations. Initial results show that OpenAI's tools lag behind competitors in desired capabilities, some military vendors tell WIRED.

large language model, machine learning, natural language

Country:

North America > United States > Virginia (0.14)
North America > United States > California (0.14)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique

Zeng, Gailun, Luo, Ziyang, Lin, Hongzhan, Tian, Yuchen, Li, Kaixin, Gong, Ziyang, Guo, Jianxiong, Ma, Jing

arXiv.org Artificial Intelligence, Nov-13-2025

The ability to critique is vital for models to self-improve and to serve as reliable AI assistants. While extensively studied in language-only settings, multimodal critique by Large Multimodal Models (LMMs) remains underexplored despite their growing capabilities in tasks like captioning and visual reasoning. In this work, we introduce MM-CRITIC, a holistic benchmark for evaluating the critique ability of LMMs across multiple dimensions: basic, correction, and comparison. Covering 8 main task types and over 500 tasks, MM-CRITIC collects responses from various LMMs of different sizes and comprises 4,471 samples. To enhance evaluation reliability, we integrate expert-informed ground-truth answers into scoring rubrics that guide GPT-4o in annotating responses and generating reference critiques, which serve as anchors for trustworthy judgments. Extensive experiments validate the effectiveness of MM-CRITIC and provide a comprehensive assessment of leading LMMs' critique capabilities across multiple dimensions. Further analysis reveals several key insights, including the correlation between response quality and critique, and varying critique difficulty across evaluation dimensions. Our code is available at https://github.com/MichealZeng0420/MM-Critic.

large language model, machine learning, natural language

arXiv:2511.09067

Country: Asia > China (0.68)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)